32 research outputs found
Memory Consistent Unsupervised Off-the-Shelf Model Adaptation for Source-Relaxed Medical Image Segmentation
Unsupervised domain adaptation (UDA) has been a vital protocol for migrating
information learned from a labeled source domain to facilitate the
implementation in an unlabeled heterogeneous target domain. Although UDA is
typically jointly trained on data from both domains, accessing the labeled
source domain data is often restricted, due to concerns over patient data
privacy or intellectual property. To sidestep this, we propose "off-the-shelf
(OS)" UDA (OSUDA), aimed at image segmentation, by adapting an OS segmentor
trained in a source domain to a target domain, in the absence of source domain
data in adaptation. Toward this goal, we aim to develop a novel batch-wise
normalization (BN) statistics adaptation framework. In particular, we gradually
adapt the domain-specific low-order BN statistics, e.g., mean and variance,
through an exponential momentum decay strategy, while explicitly enforcing the
consistency of the domain shareable high-order BN statistics, e.g., scaling and
shifting factors, via our optimization objective. We also adaptively quantify
the channel-wise transferability to gauge the importance of each channel, via
both low-order statistics divergence and a scaling factor. Furthermore, we
incorporate unsupervised self-entropy minimization into our framework to boost
performance alongside a novel queued, memory-consistent self-training strategy
to utilize the reliable pseudo label for stable and efficient unsupervised
adaptation. We evaluated our OSUDA-based framework on both cross-modality and
cross-subtype brain tumor segmentation and cardiac MR to CT segmentation tasks.
Our experimental results showed that our memory consistent OSUDA performs
better than existing source-relaxed UDA methods and yields similar performance
to UDA methods with source data.
Comment: Published in Medical Image Analysis (extension of MICCAI paper)
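The exponential momentum decay mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that the BN statistics used at adaptation step t blend the stored source statistics with the current target-batch statistics, and that the source weight decays as eta0 * exp(-gamma * t). The function names, the defaults for eta0 and gamma, and the exact blending form are illustrative assumptions, not details stated in the abstract.

```python
import math


def decayed_momentum(step, eta0=0.99, gamma=1.0):
    """Weight on the source statistics; decays exponentially with the step (assumed form)."""
    return eta0 * math.exp(-gamma * step)


def blended_bn_stats(source_mean, source_var, batch, step, eta0=0.99, gamma=1.0):
    """Blend source BN mean/variance with the statistics of one target batch.

    `batch` is a flat list of activations for a single channel (hypothetical layout).
    """
    n = len(batch)
    batch_mean = sum(batch) / n
    batch_var = sum((x - batch_mean) ** 2 for x in batch) / n  # biased variance
    eta = decayed_momentum(step, eta0, gamma)
    mean = (1.0 - eta) * batch_mean + eta * source_mean
    var = (1.0 - eta) * batch_var + eta * source_var
    return mean, var
```

Early steps stay close to the source statistics; as the step count grows, the blend is dominated by the target-batch statistics.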
Posterior Estimation for Dynamic PET imaging using Conditional Variational Inference
This work aims to efficiently estimate the posterior distribution of kinetic
parameters for dynamic positron emission tomography (PET) imaging given a
measured time-activity curve. Because the forward kinetic model loses
information when mapping from parametric imaging to measurement space, the
inverse mapping is ambiguous. The conventional (but expensive) solution is
Markov chain Monte Carlo (MCMC) sampling, which is known to produce
asymptotically unbiased estimates. We propose a deep-learning-based
framework for efficient posterior estimation. Specifically, we counteract the
information loss in the forward process by introducing latent variables. Then,
we use a conditional variational autoencoder (CVAE) and optimize its evidence
lower bound. Given a measurement and latent variables sampled from a simple
multivariate Gaussian distribution, the trained decoder can infer the
posterior. We validate our CVAE-based method using
unbiased MCMC as the reference for low-dimensional data (a single brain region)
with the simplified reference tissue model.
Comment: Published on IEEE NSS&MI
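For reference, the evidence lower bound that a conditional variational autoencoder optimizes can be written in its standard form. The notation is an assumption made here for illustration: x denotes the kinetic parameters, y the measured time-activity curve, z the latent variables, and the prior over z is taken to be a standard Gaussian, in line with the "simple multivariate Gaussian" mentioned in the abstract.

```latex
\log p_\theta(x \mid y) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x, y)}\!\left[ \log p_\theta(x \mid y, z) \right]
- \mathrm{KL}\!\left( q_\phi(z \mid x, y) \,\|\, p(z) \right),
\qquad p(z) = \mathcal{N}(0, I)
```

At inference time, posterior samples of x for a fixed y are obtained by drawing z from p(z) and passing (y, z) through the decoder.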
Synthesizing audio from tongue motion during speech using tagged MRI via transformer
Investigating the relationship between internal tissue point motion of the
tongue and oropharyngeal muscle deformation measured from tagged MRI and
intelligible speech can aid in advancing speech motor control theories and
developing novel treatment methods for speech-related disorders. However,
elucidating the relationship between these two sources of information is
challenging, due in part to the disparity in data structure between
spatiotemporal motion fields (i.e., 4D motion fields) and one-dimensional audio
waveforms. In this work, we present an efficient encoder-decoder translation
network for exploring the predictive information inherent in 4D motion fields
via 2D spectrograms as a surrogate of the audio data. Specifically, our encoder
is based on 3D convolutional spatial modeling and transformer-based temporal
modeling. The extracted features are processed by an asymmetric 2D convolution
decoder to generate spectrograms that correspond to 4D motion fields.
Furthermore, we incorporate a generative adversarial training approach into our
framework to further improve synthesis quality on our generated spectrograms.
We experiment on 63 paired motion field sequences and speech waveforms,
demonstrating that our framework enables the generation of clear audio
waveforms from a sequence of motion fields. Thus, our framework has the
potential to improve our understanding of the relationship between these two
modalities and inform the development of treatments for speech disorders.
Comment: SPIE Medical Imaging: Deep Dive Ora
Multimodal Data Integration for Computer-Aided Ablation of Atrial Fibrillation
Image-guided percutaneous interventions have successfully replaced invasive surgical methods in some areas of cardiology practice, where 3D-reconstructed cardiac images generated by magnetic resonance imaging (MRI) and computed tomography (CT) play an important role. To conduct computer-aided catheter ablation of atrial fibrillation accurately, this work considers the integration of multimodal information from electroanatomic mapping (EAM) data and MRI/CT images. Specifically, we propose a variational formulation for surface reconstruction and incorporate prior shape knowledge, which results in a level set method. The proposed method enables simultaneous reconstruction and registration under nonrigid deformation. Promising experimental results show the potential of the proposed approach.
Bias and Fairness in Chatbots: An Overview
Chatbots have been studied for more than half a century. With the rapid
development of natural language processing (NLP) technologies in recent years,
chatbots using large language models (LLMs) have received much attention.
Compared with traditional ones, modern chatbots are more powerful and
have been used in real-world applications. There are, however, bias and fairness
concerns in modern chatbot design. Due to the huge amounts of training data,
extremely large model sizes, and lack of interpretability, bias mitigation and
fairness preservation of modern chatbots are challenging. Thus, a comprehensive
overview of bias and fairness in chatbot systems is given in this paper. The
history of chatbots and their categories are first reviewed. Then, bias sources
and potential harms in applications are analyzed. Considerations in designing
fair and unbiased chatbot systems are examined. Finally, future research
directions are discussed.
Successive Subspace Learning for Cardiac Disease Classification with Two-phase Deformation Fields from Cine MRI
Cardiac cine magnetic resonance imaging (MRI) has been used to characterize
cardiovascular diseases (CVD), often providing a noninvasive phenotyping
tool. While recent deep learning-based approaches using cine MRI
yield accurate characterization results, the performance is often degraded by
small training samples. In addition, many deep learning models are deemed a
"black box," since it remains largely unclear how they arrive at a
prediction and how reliable they are. To alleviate this, this work proposes a
lightweight successive subspace learning (SSL) framework for CVD
classification, based on an interpretable feedforward design, in conjunction
with a cardiac atlas. Specifically, our hierarchical SSL model is based on (i)
neighborhood voxel expansion, (ii) unsupervised subspace approximation, (iii)
supervised regression, and (iv) multi-level feature integration. In addition,
using two-phase 3D deformation fields, including end-diastolic and end-systolic
phases, derived between the atlas and individual subjects as input offers an
objective means of assessing CVD, even with small training samples. We evaluate
our framework on the ACDC2017 database, comprising one healthy group and four
disease groups. Compared with 3D CNN-based approaches, our framework achieves
superior classification performance with 140× fewer parameters, which
supports its potential value in clinical use.
Comment: ISBI 202
Speech Audio Synthesis from Tagged MRI and Non-Negative Matrix Factorization via Plastic Transformer
The tongue's intricate 3D structure, comprising localized functional units,
plays a crucial role in the production of speech. When measured using tagged
MRI, these functional units exhibit cohesive displacements and derived
quantities that facilitate the complex process of speech production.
Non-negative matrix factorization-based approaches have been shown to estimate
the functional units through motion features, yielding a set of building blocks
and a corresponding weighting map. Investigating the link between weighting
maps and speech acoustics can offer significant insights into the intricate
process of speech production. To this end, in this work, we utilize
two-dimensional spectrograms as a proxy representation, and develop an
end-to-end deep learning framework for translating weighting maps to their
corresponding audio waveforms. Our proposed plastic light transformer (PLT)
framework is based on directional product relative position bias and
single-level spatial pyramid pooling, thus enabling flexible processing of
weighting maps with variable size to fixed-size spectrograms, without input
information loss or dimension expansion. Additionally, our PLT framework
efficiently models the global correlation of wide matrix input. To improve the
realism of our generated spectrograms with relatively limited training samples,
we apply pair-wise utterance consistency with Maximum Mean Discrepancy
constraint and adversarial training. Experimental results on a dataset of 29
subjects speaking two utterances demonstrated that our framework is able to
synthesize speech audio waveforms from weighting maps, outperforming
conventional convolution and transformer models.
Comment: MICCAI 2023 (Oral presentation)
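The idea of mapping variable-size weighting maps to a fixed-size representation through single-level spatial pyramid pooling can be sketched as follows. This is a plain-Python illustration, not the PLT implementation: the function names are hypothetical, max pooling is assumed, and the bin boundaries follow the common adaptive-pooling convention (floor for the start, ceiling for the end).

```python
def ceil_div(a, b):
    """Integer ceiling division for positive integers."""
    return -(-a // b)


def single_level_spp(matrix, out_rows, out_cols):
    """Max-pool a 2D list of any size into a fixed out_rows x out_cols grid."""
    rows, cols = len(matrix), len(matrix[0])
    pooled = []
    for i in range(out_rows):
        r0 = (i * rows) // out_rows           # bin start (floor)
        r1 = ceil_div((i + 1) * rows, out_rows)  # bin end (ceiling)
        pooled_row = []
        for j in range(out_cols):
            c0 = (j * cols) // out_cols
            c1 = ceil_div((j + 1) * cols, out_cols)
            cells = [matrix[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            pooled_row.append(max(cells))
        pooled.append(pooled_row)
    return pooled
```

Inputs with different widths, for example 2 x 4 and 2 x 6 matrices, both yield the same output shape, which is what allows a fixed-size downstream decoder.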
DRIMET: Deep Registration for 3D Incompressible Motion Estimation in Tagged-MRI with Application to the Tongue
Tagged magnetic resonance imaging (MRI) has been used for decades to observe
and quantify the detailed motion of deforming tissue. However, this technique
faces several challenges such as tag fading, large motion, long computation
times, and difficulties in obtaining diffeomorphic incompressible flow fields.
To address these issues, this paper presents a novel unsupervised phase-based
3D motion estimation technique for tagged MRI. We introduce two key
innovations. First, we apply a sinusoidal transformation to the harmonic phase
input, which enables end-to-end training and avoids the need for phase
interpolation. Second, we propose a Jacobian determinant-based learning
objective to encourage incompressible flow fields for deforming biological
tissues. Our method efficiently estimates 3D motion fields that are accurate,
dense, and approximately diffeomorphic and incompressible. The efficacy of the
method is assessed using human tongue motion during speech, and includes both
healthy controls and patients who have undergone glossectomy. We show that the
method outperforms existing approaches and also exhibits improvements in
speed and in robustness to tag fading and large tongue motion.
Comment: Accepted to MIDL 2023 (full paper)
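A Jacobian determinant-based incompressibility objective can be illustrated with a small sketch. For a deformation x + u(x), the Jacobian is I + grad u, and a determinant of 1 means local volume is preserved. The penalty below, the mean of |det(I + grad u) - 1| over voxels, is one plausible form chosen for illustration; the abstract does not give the exact loss, and the function names and the per-voxel 3x3 gradient input are assumptions.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def jacobian_determinant(disp_grad):
    """det(I + grad u) for one voxel, where disp_grad[i][j] = d u_i / d x_j."""
    jac = [
        [disp_grad[i][j] + (1.0 if i == j else 0.0) for j in range(3)]
        for i in range(3)
    ]
    return det3(jac)


def incompressibility_penalty(disp_grads):
    """Mean absolute deviation of the Jacobian determinant from 1 (assumed loss form)."""
    total = sum(abs(jacobian_determinant(g) - 1.0) for g in disp_grads)
    return total / len(disp_grads)
```

A pure shear such as u_x = 0.5 y preserves volume and incurs no penalty, whereas a uniform 10% expansion in every direction has determinant 1.1^3 and is penalized.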